security risk
The Pentagon is planning for AI companies to train on classified data, defense official says
The generative AI models used in classified environments can answer questions but don't currently learn from the data they see. The Pentagon is discussing plans to set up secure environments for generative AI companies to train military-specific versions of their models on classified data, has learned. AI models like Anthropic's Claude are already used to answer questions in classified settings; applications include analyzing targets in Iran. But allowing models to train on and learn from classified data would be a new development that presents unique security risks. It would mean sensitive intelligence like surveillance reports or battlefield assessments could become embedded into the models themselves, and it would bring AI firms into closer contact with classified data than before. Training versions of AI models on classified data is expected to make them more accurate and effective in certain tasks, according to a US defense official who spoke on background with .
- Asia > Middle East > Iran (0.25)
- North America > United States > Massachusetts (0.05)
- Information Technology (1.00)
- Government > Military (1.00)
Is Vibe Coding Safe? Benchmarking Vulnerability of Agent-Generated Code in Real-World Tasks
Zhao, Songwen, Wang, Danqing, Zhang, Kexun, Luo, Jiaxuan, Li, Zhuo, Li, Lei
Vibe coding is a new programming paradigm in which human engineers instruct large language model (LLM) agents to complete complex coding tasks with little supervision. Although it is increasingly adopted, are vibe coding outputs really safe to deploy in production? To answer this question, we propose SU S VI B E S, a benchmark consisting of 200 feature-request software engineering tasks from real-world open-source projects, which, when given to human programmers, led to vulnerable implementations. We evaluate multiple widely used coding agents with frontier models on this benchmark. Disturbingly, all agents perform poorly in terms of software security. Although 61% of the solutions from SWE-Agent with Claude 4 Sonnet are functionally correct, only 10.5% are secure. Further experiments demonstrate that preliminary security strategies, such as augmenting the feature request with vulnerability hints, cannot mitigate these security issues. Our findings raise serious concerns about the widespread adoption of vibe-coding, particularly in security-sensitive applications.
- South America > Brazil (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Shadows in the Code: Exploring the Risks and Defenses of LLM-based Multi-Agent Software Development Systems
Wang, Xiaoqing, Huang, Keman, Liang, Bin, Li, Hongyu, Du, Xiaoyong
The rapid advancement of Large Language Model (LLM)- driven multi-agent systems has significantly streamlined software developing tasks, enabling users with little technical expertise to develop executable applications. While these systems democratize software creation through natural language requirements, they introduce significant security risks that remain largely unexplored. We identify two risky scenarios: Malicious User with Benign Agents (MU-BA) and Benign User with Malicious Agents (BU-MA). We introduce the Implicit Malicious Behavior Injection Attack (IMBIA), demonstrating how multi-agent systems can be manipulated to generate software with concealed malicious capabilities beneath seemingly benign applications, and propose Adv-IMBIA as a defense mechanism. Evaluations across ChatDev, MetaGPT, and AgentV erse frameworks reveal varying vulnerability patterns, with IMBIA achieving attack success rates of 93%, 45%, and 71% in MU-BA scenarios, and 71%, 84%, and 45% in BU-MA scenarios. Our defense mechanism reduced attack success rates significantly, particularly in the MU-BA scenario. Further analysis reveals that compromised agents in the coding and testing phases pose significantly greater security risks, while also identifying critical agents that require protection against malicious user exploitation. Our findings highlight the urgent need for robust security measures in multi-agent software development systems and provide practical guidelines for implementing targeted, resource-efficient defensive strategies.
- Asia > Thailand > Bangkok > Bangkok (0.05)
- Asia > China (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
MCIP: Protecting MCP Safety via Model Contextual Integrity Protocol
Jing, Huihao, Li, Haoran, Hu, Wenbin, Hu, Qi, Xu, Heli, Chu, Tianshu, Hu, Peizhao, Song, Yangqiu
As Model Context Protocol (MCP) introduces an easy-to-use ecosystem for users and developers, it also brings underexplored safety risks. Its decentralized architecture, which separates clients and servers, poses unique challenges for systematic safety analysis. This paper proposes a novel framework to enhance MCP safety. Guided by the MAESTRO framework, we first analyze the missing safety mechanisms in MCP, and based on this analysis, we propose the Model Contextual Integrity Protocol (MCIP), a refined version of MCP that addresses these gaps. Next, we develop a fine-grained taxonomy that captures a diverse range of unsafe behaviors observed in MCP scenarios. Building on this taxonomy, we develop benchmark and training data that support the evaluation and improvement of LLMs' capabilities in identifying safety risks within MCP interactions. Leveraging the proposed benchmark and training data, we conduct extensive experiments on state-of-the-art LLMs. The results highlight LLMs' vulnerabilities in MCP interactions and demonstrate that our approach substantially improves their safety performance.
- Europe > Austria > Vienna (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Tricking LLM-Based NPCs into Spilling Secrets
Shiomi, Kyohei, Lian, Zhuotao, Nakanishi, Toru, Kitasuka, Teruaki
Large Language Models (LLMs) are increasingly used to generate dynamic dialogue for game NPCs. However, their integration raises new security concerns. In this study, we examine whether adversarial prompt injection can cause LLM-based NPCs to reveal hidden background secrets that are meant to remain undisclosed.
Unveiling the Landscape of LLM Deployment in the Wild: An Empirical Study
Hou, Xinyi, Han, Jiahao, Zhao, Yanjie, Wang, Haoyu
--Large language models (LLMs) are increasingly deployed through open-source and commercial frameworks, enabling individuals and organizations to self-host advanced LLM capabilities. As LLM deployments become prevalent, particularly in industry, ensuring their secure and reliable operation has become a critical issue. However, insecure defaults and miscon-figurations often expose LLM services to the public internet, posing serious security and system engineering risks. This study conducted a large-scale empirical investigation of public-facing LLM deployments, focusing on the prevalence of services, exposure characteristics, systemic vulnerabilities, and associated risks. Through internet-wide measurements, we identified 320,102 public-facing LLM services across 15 frameworks and extracted 158 unique API endpoints, categorized into 12 functional groups based on functionality and security risk. Our analysis found that over 40% of endpoints used plain HTTP, and over 210,000 endpoints lacked valid TLS metadata. API exposure was highly inconsistent: some frameworks, such as Ollama, responded to over 35% of unauthenticated API requests, with about 15% leaking model or system information, while other frameworks implemented stricter controls. We observed widespread use of insecure protocols, poor TLS configurations, and unauthenticated access to critical operations. These security risks, such as model leakage, system compromise, and unauthorized access, are pervasive and highlight the need for a secure-by-default framework and stronger deployment practices. Driven by renowned models like OpenAI's GPT series [33] and DeepSeek's open-source variant [9], large language models (LLMs) are rapidly gaining popularity and profoundly reshaping a wide range of applications. Once primarily confined to research labs and industrial environments, these models are now not only continuously deployed in-depth within the industry, but are also gradually opened to the wider public, promoting the vigorous development of self-hosted and open source deployment. The emergence of user-friendly tools and a vibrant community ecosystem [42], [43], [38] has enabled individual enthusiasts, small enterprises, and developers to independently deploy and customize powerful LLMs for a variety of personal and professional needs, such as creative writing and content creation [20], software development and maintenance [16], financial analysis and automated investment assistance [48], and personal productivity tools, significantly enriching their daily digital experiences.
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > Germany (0.04)
- Asia > Singapore (0.04)
- (2 more...)
LLLMs: A Data-Driven Survey of Evolving Research on Limitations of Large Language Models
Kostikova, Aida, Wang, Zhipin, Bajri, Deidamea, Pütz, Ole, Paaßen, Benjamin, Eger, Steffen
Large language model (LLM) research has grown rapidly, along with increasing concern about their limitations such as failures in reasoning, hallucinations, and limited multilingual capability. While prior reviews have addressed these issues, they often focus on individual limitations or consider them within the broader context of evaluating overall model performance. This survey addresses the gap by presenting a data-driven, semi-automated review of research on limitations of LLMs (LLLMs) from 2022 to 2025, using a bottom-up approach. From a corpus of 250,000 ACL and arXiv papers, we extract 14,648 relevant limitation papers using keyword filtering and LLM-based classification, validated against expert labels. Using topic clustering (via two approaches, HDBSCAN+BERTopic and LlooM), we identify between 7 and 15 prominent types of limitations discussed in recent LLM research across the ACL and arXiv datasets. We find that LLM-related research increases nearly sixfold in ACL and nearly fifteenfold in arXiv between 2022 and 2025, while LLLMs research grows even faster, by a factor of over 12 in ACL and nearly 28 in arXiv. Reasoning remains the most studied limitation, followed by generalization, hallucination, bias, and security. The distribution of topics in the ACL dataset stays relatively stable over time, while arXiv shifts toward safety and controllability (with topics like security risks, alignment, hallucinations, knowledge editing), and multimodality between 2022 and 2025. We offer a quantitative view of trends in LLM limitations research and release a dataset of annotated abstracts and a validated methodology, available at: https://github.com/a-kostikova/LLLMs-Survey.
- Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
- North America > United States > Virginia (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Overview (1.00)
- Research Report > Experimental Study (0.67)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
On the Security Risks of ML-based Malware Detection Systems: A Survey
He, Ping, Mao, Yuhao, Li, Changjiang, Cavallaro, Lorenzo, Wang, Ting, Ji, Shouling
Malware presents a persistent threat to user privacy and data integrity. To combat this, machine learning-based (ML-based) malware detection (MD) systems have been developed. However, these systems have increasingly been attacked in recent years, undermining their effectiveness in practice. While the security risks associated with ML-based MD systems have garnered considerable attention, the majority of prior works is limited to adversarial malware examples, lacking a comprehensive analysis of practical security risks. This paper addresses this gap by utilizing the CIA principles to define the scope of security risks. We then deconstruct ML-based MD systems into distinct operational stages, thus developing a stage-based taxonomy. Utilizing this taxonomy, we summarize the technical progress and discuss the gaps in the attack and defense proposals related to the ML-based MD systems within each stage. Subsequently, we conduct two case studies, using both inter-stage and intra-stage analyses according to the stage-based taxonomy to provide new empirical insights. Based on these analyses and insights, we suggest potential future directions from both inter-stage and intra-stage perspectives.
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > United States > New York > Suffolk County > Stony Brook (0.04)
- Asia > Nepal (0.04)
- (3 more...)
- Research Report (1.00)
- Overview (0.92)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
A Rusty Link in the AI Supply Chain: Detecting Evil Configurations in Model Repositories
Ding, Ziqi, Fu, Qian, Ding, Junchen, Deng, Gelei, Liu, Yi, Li, Yuekang
Recent advancements in large language models (LLMs) have spurred the development of diverse AI applications from code generation and video editing to text generation; however, AI supply chains such as Hugging Face, which host pretrained models and their associated configuration files contributed by the public, face significant security challenges; in particular, configuration files originally intended to set up models by specifying parameters and initial settings can be exploited to execute unauthorized code, yet research has largely overlooked their security compared to that of the models themselves; in this work, we present the first comprehensive study of malicious configurations on Hugging Face, identifying three attack scenarios (file, website, and repository operations) that expose inherent risks; to address these threats, we introduce CONFIGSCAN, an LLM-based tool that analyzes configuration files in the context of their associated runtime code and critical libraries, effectively detecting suspicious elements with low false positive rates and high accuracy; our extensive evaluation uncovers thousands of suspicious repositories and configuration files, underscoring the urgent need for enhanced security validation in AI model hosting platforms.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.90)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
ControlNET: A Firewall for RAG-based LLM System
Yao, Hongwei, Shi, Haoran, Chen, Yidou, Jiang, Yixin, Wang, Cong, Qin, Zhan
Retrieval-Augmented Generation (RAG) has significantly enhanced the factual accuracy and domain adaptability of Large Language Models (LLMs). This advancement has enabled their widespread deployment across sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, yet introduces privacy risk and security risk, notably data breaching risk and data poisoning risk. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in comprehensive research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose an AI firewall, ControlNET, designed to safeguard RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging activation shift phenomena to detect adversarial queries and mitigate their impact through semantic divergence. We conduct comprehensive experiments on four different benchmark datasets including Msmarco, HotpotQA, FinQA, and MedicalSys using state-of-the-art open source LLMs (Llama3, Vicuna, and Mistral). Our results demonstrate that ControlNET achieves over 0.909 AUROC in detecting and mitigating security threats while preserving system harmlessness. Overall, ControlNET offers an effective, robust, harmless defense mechanism, marking a significant advancement toward the secure deployment of RAG-based LLM systems.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > South Korea (0.14)
- (17 more...)
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)